Abstract

Financial news contains useful information on public companies and the
market. In this paper we apply popular word embedding methods and deep
neural networks to leverage financial news to predict stock price
movements in the market. Experimental results show that our proposed
methods are simple but effective: they significantly improve the stock
prediction accuracy on a standard financial database over a baseline
system that uses only historical price information.

1 Introduction 

In the past few years, deep neural networks (DNNs) have achieved huge
successes in many data modeling and prediction tasks, ranging from speech
recognition and computer vision to natural language processing. In this
paper, we are interested in applying these powerful deep learning methods
to financial data modeling to predict stock price movements.


Traditionally, neural networks have been used to model stock prices as
time series for forecasting purposes, such as in (Kaastra and Boyd, 1991;
Adya and Collopy, 1991; Chan et al., 2000; Skabar and Cloete, 2002; Zhu
et al., 2008). In this earlier work, due to the limited training data and
computing power available at the time, shallow neural networks were
normally used to model various types of features extracted from stock
price data sets, such as historical prices and trading volumes, in order
to predict future stock yields and market returns. More recently, in the
natural language processing community, many methods have been proposed to
exploit additional information (mainly online text data) for stock
forecasting, such as financial news (Xie et al., 2013; Ding et al., 2014),
Twitter sentiment (Si et al., 2013; Si et al., 2014), and microblogs
(Bar-Haim et al., 2011). For example, Xie et al. (2013) propose to use
semantic frame parsers to generalize from sentences to scenarios in order
to detect the (positive or negative) roles of specific companies, where
support vector machines with tree kernels are used as predictive models.
On the other hand, Ding et al. (2014) propose to use various lexical and
syntactic constraints to extract event features for stock forecasting,
where they investigate both linear classifiers and deep neural networks as
predictive models.

In this paper, we propose to use recent word embedding methods (Mikolov
et al., 2013b) to select features from on-line financial news corpora, and
employ deep neural networks (DNNs) to predict future stock movements based
on the extracted features. Experimental results show that the features
derived from financial news are very useful and that they significantly
improve the prediction accuracy over a baseline system that relies only on
historical price information.


2 Our Approach 

In this paper, we use deep neural networks (DNNs) as our predictive model,
which takes as input the features extracted from both historical price
information and on-line financial news to predict future stock movements
(either up or down).

2.1 Deep Neural Networks 

The DNNs used in this paper are conventional multi-layer perceptrons with
many hidden layers. An L-layer DNN consists of L - 1 nonlinear hidden
layers and one output layer. The output layer models the posterior
probability of each output target. In this paper, we use the rectified
linear activation function, i.e., f(x) = max(0, x), to map activations to
outputs in each hidden layer, which are in turn fed to the next layer as
inputs. For the output layer, we use the softmax function to compute
posterior probabilities over two nodes, standing for stock-up and
stock-down.
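As an illustration, the forward computation of such a network can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the helper names (`affine`, `forward`, `random_layers`), the toy layer sizes and the random weights are ours and are not taken from the experiments; training is not shown.

```python
import math
import random

def relu(v):
    """Rectified linear activation f(x) = max(0, x), applied element-wise."""
    return [max(0.0, x) for x in v]

def softmax(v):
    """Softmax over a list of scores (shifted by the max for numerical stability)."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def affine(weights, bias, v):
    """Compute W v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def forward(layers, features):
    """Run an L-layer MLP: L - 1 ReLU hidden layers followed by a softmax output layer."""
    h = features
    for weights, bias in layers[:-1]:
        h = relu(affine(weights, bias, h))
    weights, bias = layers[-1]
    return softmax(affine(weights, bias, h))

def random_layers(sizes, seed=0):
    """Hypothetical helper: small random weights, for illustration only."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

# Toy configuration (an assumption): 12 inputs, two hidden layers of 8 units,
# and 2 output nodes standing for stock-up and stock-down.
layers = random_layers([12, 8, 8, 2])
p_up, p_down = forward(layers, [0.5] * 12)
```

The two outputs are posterior probabilities and therefore sum to one; the predicted movement is the node with the larger probability.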


2.2 Features from historical price data 

In this paper, for each target stock on a target date, we take the closing
prices of the previous five days and concatenate them to form an input
feature vector for the DNNs: $P = (p_{t-5}, p_{t-4}, p_{t-3}, p_{t-2}, p_{t-1})$,
where $t$ denotes the target date and $p_m$ denotes the closing price on
date $m$. We then normalize all prices by the mean and variance calculated
from all closing prices of this stock in the training set. In addition, we
compute first and second order differences among the five days' closing
prices, which are appended as extra feature vectors. For example, we
compute the first order difference as follows:
$\Delta P = (p_{t-4}, p_{t-3}, p_{t-2}, p_{t-1}) - (p_{t-5}, p_{t-4}, p_{t-3}, p_{t-2})$.
In the same way, the second order difference $\Delta\Delta P$ is
calculated by taking the difference between each pair of adjacent values
in $\Delta P$. Finally, for each target stock on a particular date, the
feature vector representing the historical price information consists of
$P$, $\Delta P$ and $\Delta\Delta P$, i.e., 5 + 4 + 3 = 12 values.
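The construction of the price feature vector can be sketched as follows. Two details are assumptions on our part, since the text does not fix them: normalization is taken to be subtracting the mean and dividing by the standard deviation, and the differences are computed on the normalized prices. The function name `price_features` is hypothetical.

```python
def price_features(closes, mean, std):
    """Build (P, delta P, delta-delta P) from five closing prices, oldest first.

    mean and std are assumed to come from all training-set closing prices
    of the same stock.
    """
    if len(closes) != 5:
        raise ValueError("expected exactly five closing prices")
    if std <= 0:
        raise ValueError("std must be positive")
    p = [(c - mean) / std for c in closes]        # normalized prices, 5 values
    dp = [b - a for a, b in zip(p, p[1:])]        # first order difference, 4 values
    ddp = [b - a for a, b in zip(dp, dp[1:])]     # second order difference, 3 values
    return p + dp + ddp

# Toy prices and statistics, chosen only for illustration.
features = price_features([10.0, 11.0, 13.0, 12.0, 14.0], mean=12.0, std=2.0)
```

With these toy numbers the normalized prices are (-1, -0.5, 0.5, 0, 1), the first order differences are (0.5, 1, -0.5, 1), and the second order differences are (0.5, -1.5, 1.5).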


2.3 Financial news features 


In order to extract fixed-size features suitable for DNNs from financial
news corpora, we need to pre-process the text data. We first split all
financial articles into sentences and keep only those sentences that
mention at least one stock name or one public company. Each sentence is
labelled with the publication date of the original article and the
mentioned stock name. If multiple stocks are mentioned in one sentence,
the sentence is labelled once for each mentioned stock. We then group
these sentences by publication date and underlying stock name to form the
samples. Each sample contains a list of sentences that were published on
the same date and mention the same stock or company. Moreover, each sample
is labelled as positive (“price-up”) or negative (“price-down”) based on
its next day's closing price, looked up in the CRSP financial database
(Booth, 2012). In the following, we introduce our method to extract three
types of features from each sample.
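The grouping step above can be sketched as follows. The input layout is an assumption made for illustration: sentences arrive as (date, text, mentioned stocks) triples, and the next-day movement is supplied as a precomputed lookup, because the exact labelling rule applied to the CRSP prices is not spelled out here. The name `build_samples` is hypothetical.

```python
from collections import defaultdict

def build_samples(sentences, next_day_up):
    """Group sentences into one sample per (date, stock) and attach a label.

    sentences:   iterable of (date, text, mentioned_stocks) triples
    next_day_up: dict mapping (date, stock) to True for price-up, False for price-down
    """
    grouped = defaultdict(list)
    for date, text, stocks in sentences:
        for stock in stocks:  # a sentence is reused once per mentioned stock
            grouped[(date, stock)].append(text)
    samples = []
    for (date, stock), texts in sorted(grouped.items()):
        if (date, stock) not in next_day_up:
            continue  # no price information available for this sample
        label = "price-up" if next_day_up[(date, stock)] else "price-down"
        samples.append({"date": date, "stock": stock, "sentences": texts, "label": label})
    return samples

# Toy input; the movements below are invented for illustration.
sentences = [
    ("2013-01-02", "Apple slipped behind Samsung in a survey.", ["Apple", "Samsung"]),
    ("2013-01-02", "Apple unveiled a new product.", ["Apple"]),
    ("2013-01-02", "Markets were quiet.", []),
]
next_day_up = {("2013-01-02", "Apple"): True, ("2013-01-02", "Samsung"): False}
samples = build_samples(sentences, next_day_up)
```

Sentences that mention no stock contribute to no sample, which mirrors the filtering step described above.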

(1) Bag of keywords (BoK): We first select the keywords based on the
recent word embedding methods in (Mikolov et al., 2013a; Mikolov et al.,
2013b). Using the popular word2vec method from Google^1, we first compute
the vector representations for all words occurring in the training set.
Secondly, we manually select a small set of seed words, i.e., the nine
words {surge, rise, shrink, jump, drop, fall, plunge, gain, slump} in this
work, which are believed to be strong indicators of stock price movements.
Next, these seed words are used to search for other useful keywords based
on the cosine distances between the word vector of each seed word and
those of the other words occurring in the training set. For example, based
on the pre-calculated word vectors, we have found other words, such as
rebound, decline, tumble, slowdown, climb, which are very close to at
least one of the seed words. In this way, we search all words occurring in
the training set and keep the top 1,000 words (including the nine seed
words) as the keywords for our prediction task. Finally, a 1000-dimension
feature vector, called bag-of-keywords or BoK, is generated for each
sample. Each dimension of the BoK vector is the TFIDF score of the
corresponding keyword, computed from the whole training corpus.
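The keyword expansion and the BoK vector can be sketched as follows. Several choices are assumptions, since the text leaves them open: words are ranked by their best cosine similarity to any seed word, and TFIDF is computed with one common variant, tf * log(N / df). The two-dimensional vectors are toy values rather than real word2vec output, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_keywords(vectors, seeds, top_k):
    """Rank every word by its best similarity to any seed word and keep the top_k."""
    scores = {
        word: max(cosine_similarity(vec, vectors[s]) for s in seeds)
        for word, vec in vectors.items()
    }
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]

def tfidf_vector(sample_tokens, keywords, doc_freq, n_samples):
    """One TFIDF score per keyword, using tf * log(n_samples / df) (an assumed variant)."""
    vec = []
    for kw in keywords:
        tf = sample_tokens.count(kw)
        df = doc_freq.get(kw, 0)
        vec.append(tf * math.log(n_samples / df) if tf and df else 0.0)
    return vec

# Toy 2-d embeddings, for illustration only.
vectors = {
    "surge": [1.0, 0.0],
    "drop": [0.0, 1.0],
    "rebound": [0.9, 0.1],
    "tumble": [0.2, 0.9],
    "table": [-1.0, -1.0],
}
keywords = expand_keywords(vectors, seeds=["surge", "drop"], top_k=4)
bok = tfidf_vector(["shares", "surge", "surge"], ["surge", "drop"],
                   doc_freq={"surge": 10, "drop": 5}, n_samples=100)
```

The seed words always rank first because each has similarity 1 with itself, so they are retained among the selected keywords, as in the text.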

(2) Polarity score (PS): We further compute so-called polarity scores
(Turney and Littman, 2003; Turney and Pantel, 2010) to measure how each
keyword is related to stock movements and how each keyword applies to a
target stock in each sentence. To do this, we first compute the point-wise
mutual information for each keyword $w$:
$\mathrm{PMI}(w, pos) = \log \frac{\mathrm{freq}(w, pos) \times N}{\mathrm{freq}(w) \times \mathrm{freq}(pos)}$,
where $\mathrm{freq}(w, pos)$ denotes the frequency of the keyword $w$
occurring in all positive samples, $N$ denotes the total number of samples
in the training set, $\mathrm{freq}(w)$ denotes the total number of
occurrences of keyword $w$ in the whole training set, and
$\mathrm{freq}(pos)$ denotes the total number of positive samples in the
training set. Furthermore, we calculate the polarity score for each
keyword $w$ as
$\mathrm{PS}(w) = \mathrm{PMI}(w, pos) - \mathrm{PMI}(w, neg)$. The
polarity score $\mathrm{PS}(w)$ measures how (either positively or
negatively) each keyword is related to stock movements and by how much.
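A small numeric sketch of the polarity score, written directly from the definitions above, is given below. The counts are invented for illustration, and the function names are hypothetical. Zero counts would make the logarithm undefined; how such cases are handled (e.g., by smoothing) is not specified here, so the sketch simply assumes positive counts.

```python
import math

def pmi(freq_w_class, freq_w, freq_class, n_samples):
    """PMI(w, class) = log(freq(w, class) * N / (freq(w) * freq(class)))."""
    return math.log(freq_w_class * n_samples / (freq_w * freq_class))

def polarity_score(freq_w_pos, freq_w_neg, freq_w, freq_pos, freq_neg, n_samples):
    """PS(w) = PMI(w, pos) - PMI(w, neg); assumes all counts are positive."""
    return (pmi(freq_w_pos, freq_w, freq_pos, n_samples)
            - pmi(freq_w_neg, freq_w, freq_neg, n_samples))

# Invented counts: 1000 samples split evenly between positive and negative;
# the keyword occurs 80 times in positive samples and 20 times in negative ones.
ps = polarity_score(freq_w_pos=80, freq_w_neg=20, freq_w=100,
                    freq_pos=500, freq_neg=500, n_samples=1000)
```

Here PMI(w, pos) = log 1.6 and PMI(w, neg) = log 0.4, so PS(w) = log 4, a positive score for a keyword that occurs mostly in price-up samples.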

Next, for each sentence in all samples, we need to detect how each keyword
is related to the mentioned stock. To do this, we use the Stanford parser
(Marneffe et al., 2006) to detect whether the target stock is a subject of
the keyword or not. If the target stock is not the subject of the keyword
in the sentence, we assume the keyword is oppositely related to the
underlying stock and flip the sign of the polarity score. Otherwise, if
the target stock is the subject of the keyword, we keep the keyword's
polarity score as it is. For example, in a sentence like “Apple slipped
behind Samsung and Microsoft in a 2013 customer experience survey from
Forrester Research”, we first identify the keyword slipped; based on the
parsing result, we know that Apple is the subject while Samsung and
Microsoft are not. Therefore, if this sentence is used as a sample for
Apple, the polarity score of “slipped” is used directly. However, if this
sentence is used as a sample for Samsung or Microsoft, the polarity score
of “slipped” is flipped by multiplying it by -1.

^1 https://code.google.com/p/word2vec/

Finally, the resulting polarity scores are multiplied by the TFIDF scores
to generate another 1000-dimension feature vector for each sample.
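The sign flip and the final multiplication can be sketched as follows. The parser itself is not invoked: the subject relation is passed in as a boolean per keyword, and using a single flag per keyword per sample is a simplifying assumption, since a sample may contain several sentences. The function name and the numeric values are hypothetical.

```python
def ps_features(tfidf, polarity, is_subject):
    """Signed polarity features, one per keyword.

    tfidf, polarity and is_subject are lists aligned with the keyword list;
    is_subject[i] is True when the target stock is the subject of keyword i.
    """
    return [
        t * (p if subj else -p)
        for t, p, subj in zip(tfidf, polarity, is_subject)
    ]

# The "slipped" example with invented values: polarity -0.5 and TFIDF 2.0.
for_apple = ps_features([2.0], [-0.5], [True])     # Apple is the subject
for_samsung = ps_features([2.0], [-0.5], [False])  # Samsung is not the subject
```

The same keyword thus contributes a negative feature to the Apple sample and a positive feature of equal magnitude to the Samsung sample.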

(3) Category tag (CT): We further define a list of categories that may
indicate a specific event or activity of a public company, which we call
category tags. In this paper, the defined category tags are: new-product,
acquisition, price-rise, price-drop, law-suit, fiscal-report, investment,
bankrupt, government, analyst-highlights. Each category is first manually
assigned a few words that are closely related to it. For example, we have
chosen released, publish, presented, unveil as the seed words for the
category new-product, which indicates the company's announcement of new
products. As before, we use the above word embedding model to
automatically expand each word list by searching for more words that are
close to the selected seed words in terms of cosine distance. In this
paper, we choose the top 100 words to assign to each category.

After we have collected all keywords for each category, for each sample we
count the total number of occurrences of all words under each category,
and then take the logarithm to obtain a feature vector
$H = (\log N_1, \log N_2, \log N_3, \ldots, \log N_c)$, where $N_c$
denotes the total number of times the words in category $c$ appear in a
sample.
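The category tag features can be sketched as follows. One deviation is an assumption on our part: the sketch uses log(1 + N_c) rather than log N_c so that categories with no matching words remain well defined, a case the text does not address. The word list for acquisition is invented for illustration, and the function name is hypothetical.

```python
import math

def category_features(tokens, category_words):
    """One log-count feature per category, in the order of category_words."""
    feats = []
    for words in category_words.values():
        n = sum(1 for tok in tokens if tok in words)
        feats.append(math.log(1 + n))  # assumption: log(1 + N_c) avoids log(0)
    return feats

category_words = {
    "new-product": {"released", "publish", "presented", "unveil"},  # seeds from the text
    "acquisition": {"acquire"},                                     # invented example
}
tokens = ["apple", "unveil", "released", "lawsuit"]
h = category_features(tokens, category_words)
```

In this toy sample two tokens fall under new-product and none under acquisition.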

2.4 Predicting Unseen Stocks via Correlation Graph

There are a large number of stocks trading in the market. However, we can
normally find only a fraction of them mentioned in daily financial news.
Hence, for each date, the above method can only predict those stocks
mentioned in the news. In this section, we propose a new method that
extends the prediction to more stocks that may not be directly mentioned
in the financial news. We propose to use a stock correlation graph, shown
in Figure 1, to predict those unseen stocks. The stock correlation graph
is an undirected graph, where each node represents a stock and the arc
between two nodes represents the correlation between these two stocks. For
example, if some stocks in the graph are mentioned in the news on a
particular day, we first use the above method to predict these mentioned
stocks. Afterwards, the predictions are propagated along the arcs in the
graph to generate predictions for the unseen stocks.

Figure 1: Illustration of a part of the correlation graph


(1) Build the graph: We choose the top 5,000 stocks from the CRSP database
(Booth, 2012) to construct the correlation graph. For each pair of stocks
in the collection, we align their closing prices by date (between
2006/01/01 and 2012/12/31) and then calculate the correlation coefficient
between the two closing price series. The computed correlation coefficient
(between -1 and 1) is attached to the arc connecting these two stocks in
the graph, indicating their price correlation. The correlation
coefficients are calculated for every pair of stocks in the collection of
5,000 stocks. In this paper we only keep the arcs with an absolute
correlation value greater than 0.8; all other arcs are considered
unreliable and are pruned from the graph. A tiny fraction of the resulting
graph is shown in Figure 1.
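The graph construction can be sketched as follows, assuming that the correlation coefficient is the Pearson coefficient and that prices are given as a date-to-close mapping per stock. Both the data layout and the function names are assumptions made for illustration, and the price series are toy values.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def build_graph(prices, threshold=0.8):
    """Return {(stock_a, stock_b): correlation} for arcs with |correlation| > threshold.

    prices maps each stock to a dict of date -> closing price.
    """
    arcs = {}
    for a, b in combinations(sorted(prices), 2):
        dates = sorted(set(prices[a]) & set(prices[b]))  # align on shared dates
        if len(dates) < 2:
            continue
        r = pearson([prices[a][d] for d in dates], [prices[b][d] for d in dates])
        if abs(r) > threshold:
            arcs[(a, b)] = r
    return arcs

days = ["d1", "d2", "d3", "d4"]
prices = {
    "A": dict(zip(days, [10.0, 11.0, 12.0, 13.0])),
    "B": dict(zip(days, [20.0, 22.0, 24.0, 26.0])),  # moves with A
    "C": dict(zip(days, [5.0, 4.0, 3.0, 2.0])),      # moves against A
    "D": dict(zip(days, [1.0, 3.0, 1.0, 3.0])),      # weakly related, pruned
}
arcs = build_graph(prices)
```

Note that strongly negative correlations are kept as well, since the pruning rule is applied to the absolute value.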


(2) Predict unseen stocks: In order to predict price movements of unseen
stocks, we first take the prediction results of the mentioned stocks from
the DNN outputs and use them to construct a 5000-dimension vector x. Each
dimension of x corresponds to one stock, and we set the dimensions of all
unseen stocks to zero. The graph propagation process can be represented
mathematically as a matrix multiplication: x' = Ax, where A is a symmetric
matrix denoting all correlation weights in the graph. The graph
propagation, i.e., the matrix multiplication, may be repeated several
times until the prediction x' converges.
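The propagation step x' = Ax can be sketched as follows. Several details are assumptions, since they are not fixed in the text: each observed stock is encoded by a signed score (for example, the up probability minus the down probability), the diagonal of A is zero, and a fixed number of propagation steps is used instead of a convergence test. The function name and the toy arcs are hypothetical.

```python
def propagate(arcs, stocks, x, steps=1):
    """Apply x' = A x `steps` times, where A holds the correlation weights.

    arcs:   dict mapping (stock_a, stock_b) to a correlation weight
    stocks: ordered list of stock names, defining the dimensions of x
    x:      list of scores, zero for unseen stocks
    """
    index = {s: i for i, s in enumerate(stocks)}
    n = len(stocks)
    a = [[0.0] * n for _ in range(n)]
    for (s1, s2), r in arcs.items():
        i, j = index[s1], index[s2]
        a[i][j] = a[j][i] = r  # undirected graph, so A is symmetric
    for _ in range(steps):
        x = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

stocks = ["A", "B", "C"]
toy_arcs = {("A", "B"): 0.9, ("A", "C"): -0.85}
x = [0.6, 0.0, 0.0]  # only stock A is mentioned in the news, predicted up
x_new = propagate(toy_arcs, stocks, x)
```

After one step, B inherits a positive score from A, while C, which is negatively correlated with A, receives a negative score. With a zero diagonal the entry for A itself is overwritten, so in this sketch the propagated vector is meant to be read for the unseen stocks only.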

3 Dataset 


The financial news data used in this paper are provided by (Ding et al.,
2014) and contain 106,521 articles from Reuters and 447,145 from
Bloomberg. The news articles were published in the period from October
2006 to December 2013. The historical stock security data are obtained
from the Centre for Research in Security Prices (CRSP) database (Booth,
2012). We only use the security data from 2006 to 2013 to match the time
period of the financial news. Based on the samples' publication dates, we
split the dataset into three sets: a training set (all samples between
2006-10-01 and 2012-12-31), a validation set (2013-01-01 to 2013-06-15)
and a test set (2013-06-16 to 2013-12-31). The training set contains
65,646 samples, the validation set 10,941 samples, and the test set 9,911
samples.


4 Experiments

4.1 Stock Prediction using DNNs

In the first set of experiments, we use DNNs to predict each stock's price
movement based on a variety of features, namely producing a polar
prediction of the price movement on the next day (either price-up or
price-down). We have trained a set of DNNs using different combinations of
feature vectors and found that a DNN structure with 4 hidden layers (with
1024 hidden nodes in each layer) yields the best performance on the
validation set. We use the historical price features alone to create the
baseline, and the various features derived from the financial news are
added on top of it. We measure the final performance by calculating the
error rate on the test set. As shown in Table 1, the features derived from
financial news significantly improve the prediction accuracy, and we
obtain the best performance (an error rate of 43.13%, compared with 48.12%
for the price-only baseline) by using all the features discussed in
Sections 2.2 and 2.3.


feature combination        error rate
price                      48.12%
price + BoK                46.02%
price + BoK + PS           43.96%
price + BoK + CT           45.86%
price + PS                 45.00%
price + CT                 46.10%
price + PS + CT            46.03%
price + BoK + PS + CT      43.13%

Table 1: Stock prediction error rates on the test set.



Figure 2: Predict unseen stocks via correlation


4.2 Predict Unseen Stocks via Correlation 

Here we group all outputs from the DNNs based on the dates of all samples
in the test set. For each date, we create a vector x based on the DNN
prediction results for all observed stocks and zeros for all unseen
stocks, as described in Section 2.4. Then, the vector is propagated
through the correlation graph to generate another set of stock movement
predictions. We may apply a threshold on the propagated vector to prune
all low-confidence predictions. The remaining ones may be used to predict
some stocks unseen in the test set. The predictions for all unseen stocks
are compared with the actual stock movements on the next day. Experimental
results are shown in Figure 2, where the left y-axis denotes the
prediction accuracy and the right y-axis denotes the percentage of stocks
predicted out of all 5000 per day under each pruning threshold. For
example, using a large threshold (0.9), we may predict with an accuracy of
52.44% on 354 extra unseen stocks per day, in addition to the 110 stocks
per day predicted directly on the test set.
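The pruning and evaluation step can be sketched as follows. The scale of the propagated scores and the exact form of the threshold test are not specified above, so the sketch assumes that the threshold is applied to the absolute value of the propagated score and that the sign gives the predicted direction. The function name and all numbers are invented for illustration.

```python
def prune_and_score(propagated, observed, actual_up, threshold):
    """Keep confident predictions for unseen stocks and measure their accuracy.

    propagated: dict mapping stock to its propagated score
    observed:   set of stocks mentioned in the news (excluded here)
    actual_up:  dict mapping stock to True if its price rose on the next day
    Returns (number of stocks predicted, accuracy or None if none were kept).
    """
    kept = {
        s: v for s, v in propagated.items()
        if s not in observed and abs(v) >= threshold
    }
    if not kept:
        return 0, None
    correct = sum(1 for s, v in kept.items() if (v > 0) == actual_up[s])
    return len(kept), correct / len(kept)

propagated = {"A": 1.2, "B": 0.95, "C": -0.91, "D": 0.3}
observed = {"A"}
actual_up = {"B": True, "C": True, "D": False}
count, accuracy = prune_and_score(propagated, observed, actual_up, threshold=0.9)
```

Raising the threshold keeps fewer unseen stocks per day, which is the trade-off between coverage and accuracy plotted in Figure 2.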


5 Conclusion 


In this paper, we have proposed a simple method to leverage financial news
to predict stock movements based on popular word embedding and deep
learning techniques. Our experiments have shown that financial news is
very useful in stock prediction and that the proposed methods
significantly improve the prediction accuracy on a standard financial data
set, reducing the test error rate from 48.12% to 43.13%.